1. EXECUTIVE SUMMARY


This report summarizes the work done by MIT in Year 2 (1 February 1997 through 31
January  1998)  of  the  ONR  Grant  N00014-1-96-0379  entitled  “Training  Spatial  Knowledge 
Acquisition  Using  Virtual  Environments.”  It  has  been  written  by  Dr.  Thomas  E.  v.  Wiegand, 
Nathaniel  Durlach  (PI),  Glenn  Koh,  S.  Brian  Fitch,  and  Rebecca  Lee  Garnett. 

During  this  second  year  of  the  program,  we  have  continued  along  the  line  of  development 
charted  during  Year  1.  We  have  completed  the  preliminary  experiment  on  the  acquisition  of 
spatial  knowledge  by  means  of  brief  training  in  a  number  of  VE-based  simulations,  and  begun 
analysis  of  the  comparative  effectiveness  of  immersive,  non-immersive,  and  3D  model-based 
virtual  environments  in  training  configurational  knowledge  of  a  small  architectural  space.  We 
have  completed  initial  development  and  testing  of  a  small,  cost-effective  prototype  locomotion 
interface  device,  the  “finger  walker,”  to  enable  more  natural  locomotion  in  virtual  environments 
than  is  possible  through  use  of  a  joystick.  In  addition,  we  have  continued  work  on  the 
development  of  a  room-scanning  robot,  a  device  for  collecting  texture-map  data  from  large 
complex  venues.  Finally,  our  experience  of  the  difficulty  entailed  in  developing  photorealistic 
models  of  a  complex  large-scale  environment  has  also  led  us  to  begin  work  on  the  development 
of  a  new  VE  construction  system  allowing  us  to  import  two-dimensional  building  floorplan  files 
and to automatically attach textures to walls, doors, windows, and other objects within the virtual
building. 

The body of this report, which is devoted to describing the above-mentioned work, is
supplemented by two theses (Appendices 1 and 2) related to this work.


2. PROGRESS REPORT


Our  work  during  Y2  can  be  summarized  under  the  following  headings:  (1)  Experiment  on 
spatial  knowledge  acquisition  in  an  architectural  space;  (2)  Development  of  a  VE-based 
navigational  interface;  and  (3)  Development  of  a  room  scanner. 

During Y1 of this grant, we initiated development of facilities (mainly software) and
selection of experimental venues and procedures to study VE-assisted training of spatial
behavior.  Although  we  initially  planned  to  conduct  experiments  in  Y2  in  a  complex  architectural 
space  (a  four-story,  89,000  sq.  ft.  warehouse  containing  lots  of  discarded  furniture  and 
equipment),  we  soon  realized  when  attempting  to  model  this  space  that  our  facilities  were  totally 
inadequate  for  such  an  effort.  This  realization  led  us  to  (1)  emphasize  more  heavily  experimental 
work  in  simpler  venues  (see  Sec.  2.1  below)  and  (2)  devote  serious  attention  to  the  development 
of  improved  methods  for  constructing  VE  models  of  architectural  spaces  (see  Sec.  2.3  below). 


2.1  Experiment 

A preliminary experiment was designed at the end of Year 1 for the purpose of testing
our proposed processes for VE modeling as well as our experimental procedure. The goal of this
experiment was to establish the efficacy of virtual environment technology as a training medium
for spatial knowledge acquisition, and to determine the extent to which configurational
knowledge of a space can be acquired through the process of exploring a high-resolution
photorealistic VE simulation of an actual space. Completion of this preliminary experiment and the
analysis of data gathered from its participants constituted the principal work accomplished in this
area during Year 2.

2.1.1  Method.  As  described  in  our  Year  1  report,  the  venue  selected  for  this  preliminary 
experiment  consisted  of  a  portion  of  the  seventh  floor  of  MIT  building  36,  a  site  which  had 
previously  been  used  for  the  low-end  feasibility  study  described  in  Sec.  2.3.3  of  the  Year  1 
report.  A  VE  model  of  this  venue  was  developed  using  the  same  hardware/software  facilities  as 
those  intended  for  use  in  creating  the  VE  model  of  the  large-scale  warehouse.  The  test  task 
employed  was  intended  to  evaluate  subjects’  configurational  knowledge  of  the  space  after  a  brief 
period  of  training  in  one  of  four  conditions:  RW  (Real  World);  VE  (Immersive  VE);  NVE 
(Non-Immersive  VE);  and  Mod  (Model).  Following  training  in  one  of  these  conditions,  the 
subjects were brought to the actual venue, positioned at each of four given reference points
within the real space (stations), and asked to estimate the location of a number of landmarks
within the real space by reporting the azimuth of each landmark and its range relative to the
location  of  the  station. 

For all training conditions, subjects were informed in advance about the task they would
be asked to perform at the completion of the training period. Training included 10 minutes of
free  exploration  under  the  conditions  of  the  specific  training  method.  For  those  in  the  RW 
condition,  a  10  minute  self-guided  exploration  of  the  actual  space  was  the  only  training  provided. 
In  the  other  cases  (VE,  NVE  and  Mod),  subjects  were  given  a  brief  opportunity  to  familiarize 
themselves  with  the  training  technology  to  be  used  prior  to  the  beginning  of  the  10  minute 
training period, during which the VE representation of the actual space was provided. Subjects
were  not  informed  of  the  specific  landmarks  they  would  be  asked  to  identify  during  the 
subsequent  testing  phase  of  the  experiment,  but  were  told  to  note  the  relative  spatial 
configuration  of  the  venue,  rather  than  to  try  to  memorize  the  characteristics  of  individual  objects 
in  the  space. 

In the VE condition, subjects used a headmounted display (HMD) to view the
representation  of  the  space,  and  a  joystick  for  a  first-person  simulated  walkthrough  within  it. 

The  NVE  condition  used  the  same  equipment  and  procedures,  except  that  the  representation  was 
viewed  on  a  21”  monitor  rather  than  through  an  HMD.  In  the  Mod  condition,  subjects  viewed  on 
the  monitor  a  miniature  exocentric  3D  model  of  the  space,  essentially  a  3D  equivalent  to  a  map 
of  the  space,  that  could  be  manipulated  using  a  mouse.  This  model  allowed  for  a  change  in  the 
subjects’  point  of  view  as  well  as  the  opportunity  for  viewing  “inside”  the  space  (as  though  the 
ceiling  of  the  space  had  been  removed). 

2.1.2  Equipment.  The  VE  walkthroughs  used  in  this  study  were  run  on  modified  Easyscene 
software  from  Coryphaeus  running  on  an  SGI  Onyx  RE/2.  The  Onyx  was  equipped  with  two 
R4400 processors operating at 150 MHz and 128 MB of RAM. The Easyscene software was
modified to allow for a first-person walkthrough through a developed model with collision
detection and joystick support added. Modifications were also made to enable support for the
headmounted  display  and  adaptation  of  multiple  viewpoints  and  control  methods  for  immersive, 
non-immersive,  and  exocentric  experimental  conditions. 

The  headmounted  display  used  was  a  Virtual  Research  VR4.  The  device  has  a  horizontal 
resolution  of  350  lines,  a  vertical  resolution  of  230  lines,  and  a  60  degree  field  of  view.  It  was 
not  stereo-enabled  for  this  study.  Orientation  of  the  HMD  was  determined  by  an  attached 
Polhemus 3Space Fastrak sensor which provided orientation and position information to the
walkthrough  software  for  translation  to  a  properly  corresponding  image  in  the  HMD.  Positional 
translation  was  accomplished  through  the  use  of  a  joystick. 

Estimation  of  azimuth  during  the  testing  phase  of  the  experiment  was  accomplished  using 
a  pointer  attached  to  a  tripod-mounted  protractor.  Subjects  were  transported  from  station  to 
station  in  a  wheelchair,  with  their  eyes  blindfolded  to  prevent  the  possibility  of  learning 
contamination  from  visual  cues  other  than  those  available  at  the  testing  stations.  Landmarks 
selected  as  pointing  targets  were  obscured  from  the  subjects’  view  by  the  interposition  of  walls 
and doors.

The architectural model of the space was constructed, using Coryphaeus Designer’s
Workbench software, from blueprints of MIT Building 36. Objects within the space were fully
texture mapped from within Designer’s Workbench, with textures obtained by photography with
both a Nikon N6006 camera with film scanned at 1280x1024 by Konica, and a Kodak DC20
Digital Science digital camera. Texture maps were edited using Kodak PhotoEasy software and
Adobe Photoshop, and imported into Designer’s Workbench for application to the building
model.

2.1.3 Subjects. For this experiment, 36 subjects were recruited from the student population of
MIT. None had prior experience of the experimental venue. All were compensated for their
time,  and  all  signed  the  COUHES  agreement  on  the  use  of  human  subjects.  Each  was  randomly 
assigned  to  one  of  the  four  training  conditions.  Following  the  experiment,  each  was  asked  to  fill 
out  two  surveys,  one  on  the  virtual  environment  and  one  on  the  experience  of  immersion. 

Subjects  whose  bearing  estimates  were  more  than  90  degrees  offset  from  the  actual  target  were 
considered  to  be  disoriented  and  their  results  were  excluded  from  the  analysis.  Each  subject’s 
angular  and  range  estimation  was  tested  at  the  first  experimental  station  by  asking  him/her  to 
point  at  an  object  that  could  be  seen;  these  estimates  were  later  used  for  calibration  purposes.  No 
correct-answer  feedback  was  given  at  any  time  during  the  testing  phase. 
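
The exclusion rule described above depends on the smallest angular offset between an estimated bearing and the actual bearing. The following is a minimal sketch only, assuming both bearings are expressed in degrees from a common reference direction. The function names are illustrative and do not come from the study's software, and the report does not state whether the rule was applied per estimate or per subject.

```python
def angular_offset(estimated_deg, actual_deg):
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    diff = (estimated_deg - actual_deg) % 360.0
    return min(diff, 360.0 - diff)


def is_disoriented(estimated_deg, actual_deg, threshold_deg=90.0):
    """True when the offset exceeds the 90 degree threshold named in the text."""
    return angular_offset(estimated_deg, actual_deg) > threshold_deg
```

The modulo step handles wrap-around, so an estimate of 350 degrees against an actual bearing of 10 degrees counts as a 20 degree offset rather than 340 degrees.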

2.1.4 Results. The most accurate estimates of the bearing to objects seemed to come from the
groups trained in the non-immersive Virtual Environment (NVE) and Model conditions, with
averaged mean errors of 10.07 (1.76) and 10.11 (1.94) degrees respectively. The groups trained
in the Real World (RW) and Virtual Environment (VE) conditions had error scores of 11.92 (1.58)
and 12.18 (1.69) respectively.

In the estimation of distance to objects, again the Model and NVE groups seemed to fare
best, with averaged mean errors of 19.46 (4.5) feet and 19.36 (4.97) feet, followed by the VE and
RW conditions at 24.76 (4.1) feet and 33.61 (4.06) feet respectively.

Bearing  and  distance  estimates  from  each  subject  were  combined  with  the  position  of  the 
initial  pointing  station  to  derive  the  cartesian  coordinates  of  the  subjects’  estimated  location  of 
the  landmark  objects.  As  before,  the  Model  and  NVE  conditions  fared  best,  with  average  mean 
errors  of  25.49  (5.15)  feet  and  27.48  (5.69)  feet,  followed  by  VE  and  RW  conditions  at  33.14 
(4.77)  feet  and  40.29  (4.64)  feet  respectively. 
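
The combination of bearing, distance, and station position into a cartesian estimate can be illustrated with a short sketch. It assumes that azimuth is measured clockwise from the +y axis of the floorplan and that all distances are in feet. The report does not state the angular convention actually used, and the function names are hypothetical.

```python
import math


def estimate_to_xy(station_xy, azimuth_deg, range_ft):
    """Convert an azimuth/range estimate made at a station to x, y coordinates.

    Assumption: azimuth is measured clockwise from the +y axis.
    """
    theta = math.radians(azimuth_deg)
    x = station_xy[0] + range_ft * math.sin(theta)
    y = station_xy[1] + range_ft * math.cos(theta)
    return (x, y)


def localization_error(station_xy, azimuth_deg, range_ft, landmark_xy):
    """Magnitude of the error vector between estimated and true landmark position."""
    ex, ey = estimate_to_xy(station_xy, azimuth_deg, range_ft)
    return math.hypot(ex - landmark_xy[0], ey - landmark_xy[1])
```

For example, a subject at station (0, 0) who points straight along the +y axis and reports 10 feet for a landmark actually located at (0, 13) would have a localization error of 3 feet.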

The magnitude of the error vector can be used as an indicator of the ability of the subject
to point to a specified object. Error values from each pair of station and target were found for
each subject, then averaged over all subjects in their respective experimental conditions. By
combining  bearing  and  distance  tasks,  the  results  should  give  an  indication  of  the  subjects’ 
knowledge  of  the  configuration  of  the  space.  In  this  analysis,  those  who  trained  in  the  RW 
condition  performed  most  poorly,  while  the  NVE  and  Model  conditions  performed  similarly 
well.  Subjects  in  the  RW  and  VE  conditions  performed  especially  poorly  when  the  specified 
objects  were  located  at  a  long  distance  from  the  station. 

An ANOVA for bearing error (obtained F of 1.22; criterion F of 2.37) indicates that there
is not enough evidence to conclude that any of the training conditions had an effect on the
performance of the bearing estimation test. This does not indicate that there are no differences in
the training methods with regard to this test, but that the variance between the different training
conditions is not sufficiently larger than the variance within training conditions.
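
The comparison of between-condition variance to within-condition variance is what the F statistic of a one-way ANOVA expresses. The sketch below shows the standard textbook calculation in plain Python. It is an illustration only, not the analysis code used in the study, and the example scores in the usage note are made up.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within
```

With the invented groups [1, 2, 3], [2, 3, 4], and [5, 6, 7], the function returns an F of 13. The obtained F is then compared against a criterion F for the chosen significance level and degrees of freedom.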

An ANOVA for mean distance error (obtained F of 7.02; criterion F of 2.37) indicates that
there  is  some  correlation  between  the  training  conditions  and  performance  in  the  distance 
estimation  task.  With  regard  to  this  task,  the  RW  training  condition  gave  particularly  poor 
results, with the NVE and Model conditions faring well.

The cartesian localization analysis of the bearing and distance estimation tasks is the
metric  which  perhaps  comes  closest  to  the  idealized  measure  of  spatial  knowledge:  it  indicates 
how  well  the  subjects  learned  the  spatial  locations  of  the  targeted  landmarks.  ANOVA  results  in 
an  Obtained  F  of  5.25  and  Criterion  F  of  2.37,  again  indicative  of  a  positive  correlation  between 
training  method  and  test  performance. 

2.1.5 Discussion. That the best results in terms of accuracy of bearing and distance estimation
came  from  the  group  trained  with  the  virtual  model  of  the  space  is  not  surprising,  given  the  fact 
that  map  training  can  be  highly  effective,  and  the  Model  provides  the  advantages  of  a  3D  map 
plus  photorealistic  rendering  of  the  venue.  In  this  respect  it  is  analogous  to  the  WIM,  or  World- 
in-Miniature  (Stoakley,  Conway,  Pausch  1995),  which  may  perhaps  provide  better  training 
results  than  training  in  a  fully  immersive  virtual  environment.  Subjects  trained  in  the  Model 
condition  had  access  to  the  same  realistic  rendering  of  the  virtual  venue  as  did  those  trained  in 
the  other  simulation  groups  (VE  and  NVE),  but  were  not  limited  to  the  egocentric  walkthrough 
method  of  learning  the  space;  rather,  they  could  acquire  their  knowledge  of  the  space  from 
alternative  and  manipulable  points  of  view. 

The  relatively  poor  performance  of  those  trained  in  the  Real  World  condition  can  be 
explained  in  a  number  of  ways.  First,  it  is  possible  that  the  level  of  detail  in  the  real  world,  being 
greater  than  that  which  could  be  rendered  in  any  of  the  virtual  representations,  proved  to  be 
distracting  during  the  period  when  the  subjects  were  attempting  to  learn  the  configuration  of  the 
space.  Second,  it  is  possible  that  those  who  trained  in  a  computer-generated  condition  had  the 
advantage  of  a  greater  degree  of  interest  in  the  task  than  those  who  simply  had  10  minutes  to 
walk  around  and  look  at  the  actual  space. 

The  advantage  of  the  NVE  condition  over  the  VE  condition  can  perhaps  be  explained  by 
the  better  resolution,  refresh  rate,  and  color  provided  by  the  monitor  as  compared  to  the  HMD,  as 
well  as  by  the  subjects’  relative  unfamiliarity  with  the  HMD  equipment  compared  with  their 
previous  experience  using  a  monitor  and  joystick  combination. 

The  primary  purpose  of  this  study  was  to  determine  whether  adequate  knowledge  of  an 
architectural  venue  can  be  obtained  through  training  in  a  virtual  simulation  rather  than  in  the  real 
world  venue.  Results  show  that  this  is  indeed  the  case,  since  the  groups  trained  in  the  computer 
simulations of the space performed as well as or better than those trained in the real world on the
measures chosen for comparison. However, the comparisons among the various forms of VE
training were less conclusive. The exocentric model training condition performed well, as did
the non-immersive VE condition. ANOVAs showed a high variance in the RW and VE
conditions and a relatively lower variance and degree of error in the NVE and Model conditions,
even though the same model was used in both conditions. This suggests that there may be
independent advantageous factors in the NVE and Model training conditions which, if combined,
could lead to the development of a highly effective computer-based virtual environment training
method. 

While  the  intersubject  variance  proved  to  be  large,  it  was  shown  in  this  study  that  a 
virtual  environment  training  system  can  be  as  effective  as  a  real-world  experience  with  respect  to 
the  localization  of  landmarks.  With  a  baseline  established  and  with  the  development  of  a 
runtime virtual walkthrough system, it may be possible to focus in the future on developing and
testing  training  techniques  and  establishing  their  efficacy  in  the  training  of  spatial  knowledge 
acquisition. 

For  additional  details  on  this  experiment  and  its  results,  see  Appendix  1  (Koh,  1997). 


2.2  Locomotion  Interface  Development 

One  important  aspect  of  a  VE  training  program  designed  to  facilitate  the  acquisition  of 
knowledge  about  a  space  involves  providing  the  user  with  a  plausible  means  of  moving  about 
within  the  virtual  environment.  As  planned  during  Year  1,  our  research  program  in  Year  2  has 
focused  on  the  development  and  testing  of  an  inexpensive  interface  that  allows  a  user  to  “finger 
walk,”  or  simulate,  by  means  of  finger  motion,  the  activity  of  walking  within  and  through  a 
virtual  environment.  While  there  seems  to  be  some  direct  relationship  between  the  development 
of  spatial  knowledge  and  the  amount  and  type  of  effort  expended  in  moving  from  place  to  place 
(as  one  expends  effort  while  walking  in  the  real  world),  the  choice  of  a  motion-control  interface 
for  exploration  within  virtual  environments  may  be  constrained  by  factors  such  as  cost.  The 
interface  which  we  have  designed  as  part  of  our  Year  2  work  is  one  which  operates  within  the 
well-known “walking metaphor” of motion control, making use of a low-friction pad that allows
the  user  to  “walk  in  place”  by  means  of  moving  his  fingers,  and  an  electric  field  sensing  system 
that  monitors  the  position  of  the  fingers  on  the  pad.  The  interface  effectively  tracks  the  user’s 
movement  along  the  surface  of  the  pad  for  input  into  the  virtual  environment. 

The  potential  benefits  of  this  work  are  twofold.  First,  it  is  possible  that  many  of  the 
expected  advantages  of  a  full-scale  walking  interface  can  be  realized  in  a  more  cost-effective 
manner  by  means  of  a  scaled-down,  finger-walking  interface.  Secondly,  the  experience  gained 
in developing such a finger-walking interface using a slippery pad may be useful for subsequent
work  on  a  slippery-floor  walking  interface. 

The finger walking device, as developed and described in Appendix 2, is an inexpensive,
compact, easy-to-use interface for providing locomotion within virtual environments. The
operator uses a natural walking-like motion with fore and middle fingers, with minimal
equipment attached to his body. The input to the user interface is a tracking of the change in the
electric field created by the user’s fingers. The output to the virtual environment from the finger
walker is a velocity vector, consisting of a magnitude and a direction.

The operation of the interface is easy and straightforward. First, the user sits down at the
computer, workstation or other setup for viewing the virtual environment. The user then attaches
transmitter electrodes to his fingers for tracking. Next, the user places an HMD on his head, or
positions himself before a standard computer monitor, to view the virtual environment. Finally,
the user places his fingers on the finger walker pad and begins moving his fingers in a walking-like
motion. The finger walker and the virtual environment software perform the calculations
which update the position of the user in the virtual environment. The user interface consists of
five distinct stages of operation: signal detection, data acquisition, translation, special operation
instructions, and velocity computation. Hardware systems detect the electric field and send the
data to a computer for processing. The finger walker software package then manipulates the
electric  potential  received  from  an  analog  to  digital  card  to  compute  a  velocity  vector. 
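
The final stage, velocity computation, can be illustrated with a generic sketch that derives a magnitude and a direction from two successive finger positions. This is not the algorithm of Appendix 2. It assumes that the earlier stages have already translated the sensed electric potentials into x, y pad coordinates, that direction is measured counterclockwise from the +x axis, and that samples arrive at a known interval. All names are hypothetical.

```python
import math


def velocity_vector(prev_xy, curr_xy, dt_s):
    """Return (magnitude, direction_deg) from two finger positions dt_s seconds apart.

    Assumption: direction is measured counterclockwise from the +x axis.
    """
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    magnitude = math.hypot(dx, dy) / dt_s
    direction_deg = math.degrees(math.atan2(dy, dx)) % 360.0
    return magnitude, direction_deg
```

A real implementation would also need to decide which phase of the finger stride drives forward motion and would likely filter the noisy sensor signal, neither of which is attempted here.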

During operation, the coordinates of the user’s fingers and the velocity of the user
through the virtual environment are displayed within an on-screen window, which also provides a
graphical position map and a directional compass. A separate window tracks the movement of
the user through the virtual environment by drawing a line along the path of the user as
determined by the magnitude and direction variables.
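
Drawing the path from the magnitude and direction variables amounts to integrating the velocity vector over time. A minimal sketch follows, assuming a fixed update interval and a direction measured counterclockwise from the +x axis. The function name and conventions are illustrative, not taken from the finger walker software.

```python
import math


def integrate_path(start_xy, velocities, dt_s):
    """Accumulate (magnitude, direction_deg) samples into a list of path points."""
    x, y = start_xy
    path = [(x, y)]
    for magnitude, direction_deg in velocities:
        theta = math.radians(direction_deg)
        x += magnitude * math.cos(theta) * dt_s
        y += magnitude * math.sin(theta) * dt_s
        path.append((x, y))
    return path
```

The returned list of points is what a tracking window would connect with line segments.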

Initial  tests  of  the  interface  have  provided  evidence  to  support  the  electric  field  proximity 
sensor  as  an  efficient  method  by  which  to  track  the  movement  of  the  user.  The  finger  walker 
appears  to  be  fairly  accurate  when  tracking  the  position  of  the  fingers  across  the  pad,  and  the 
tracking  window  shows  that  the  device  provides  an  effective  means  of  moving  through  a  virtual 
environment.  Some  limitations  of  the  present  device  include  a  lack  of  memory  of  past  system 
events,  and  extreme  sensitivity  of  the  hardware  to  slight  changes  in  the  position  of  the  finger,  as 
well  as  to  noise  in  the  system.  It  is  expected  that  improvements  to  the  system  can  be  introduced 
to eliminate these difficulties, and that the effect of the “walking” motion and expenditure
of  effort  on  the  user’s  ability  to  estimate  distances  accurately  in  the  virtual  environment  should 
then  be  subject  to  future  experimental  study. 

For  further  details  on  this  work,  see  Appendix  2  (Fitch,  1998). 


2.3  Room  Scanner 

2.3.1  Progress  on  room  scanning  robot  and  work  on  scanning  rack 

During Y2, we have continued the development of the room scanner, a novel device for
capturing  environmental  texture  from  the  viewpoint  of  the  VE  participant.  This  is  achieved  using 
an  eye-level  camera  with  wide  angle  (110  degree)  lens  moved  in  precise  increments  in  parallel  to 
the  observed  detail,  with  distances  and  orthogonality  being  preserved  through  the  use  of 
alignment  lasers.  A  software  controller  steps  the  camera  along  a  section  of  wall,  capturing  image 
frames  at  regular  intervals.  A  tiling  tool  then  takes  these  frames  and  pastes  them  together  to 
form  long  strips  of  texture  for  each  wall  section.  These  sections  are  then  attached  to  specific 
polygons  within  the  floorplan,  or  set  as  the  default  texture  for  a  particular  room  or  class  of 
objects.  Once  textures  have  been  associated  with  a  floorplan,  the  resulting  data  is  exported  as  a 
list  of  3D  polygons  and  associated  textures.  These  post-processing  tools  are  further  described  in 
a  later  section. 
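
As an illustration of the tiling step, the sketch below pastes a narrow central band from each captured frame side by side to form one long strip. It assumes that every frame has the same size, that frames are stored as lists of pixel rows, and that the camera step between frames corresponds to the band width. The report does not specify how much of each frame the actual tiling tool keeps, so the `slit_width` parameter is an assumption.

```python
def tile_frames(frames, slit_width):
    """Concatenate the central slit_width columns of each frame into one strip."""
    if not frames:
        return []
    height = len(frames[0])
    strip = [[] for _ in range(height)]
    for frame in frames:
        width = len(frame[0])
        start = (width - slit_width) // 2  # left edge of the central band
        for row_index in range(height):
            strip[row_index].extend(frame[row_index][start:start + slit_width])
    return strip
```

Keeping only the band directly in front of the lens is one way to approximate a view that is orthogonal to the wall along the whole strip.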

In  the  early  part  of  this  project  year,  we  made  steady  progress  on  aspects  relating  to  image 
data  collection,  sensors,  and  software;  however,  a  number  of  problems  arose  relating  to 
mechanical  issues  which  have  led  us  to  (temporarily)  modify  the  form  of  the  scanning  device. 

One  of  the  major  successes  has  been  that  of  integrating  the  QuickCam  image  capture 
hardware  with  the  QuickTime  multimedia  storage  and  compression  format.  Within  the 
standardized  API  environment  provided  by  Apple  and  the  adherents  to  the  QuickTime  standard 
(such as Connectix, the producers of QuickCam), we have been able to put together our image
capture  software  in  a  way  that  preserves  the  refinement  of  the  Macintosh  Interface,  and  also 
maintains  compatibility  with  future  upgraded  versions  of  any  of  the  component  pieces.  In 
addition  to  being  able  to  use  the  graphical  image  compression  and  storage  aspects  of  the 
QuickTime  format,  we  have  included  acoustic  impulse  response  information  keyed  to  location 
and  coordinate  information  from  floorplans.  These  files  are  directly  browsable  using  QuickTime 
players  available  on  many  platforms,  and  serve  as  a  stream  of  "raw  data"  for  our  VE-generating 
post-processing  tools. 

We  have  encountered  a  number  of  mechanical  difficulties  with  the  mobile  robot-based 
room  scanner  on  which  we  began  working  last  year.  These  difficulties  arise  primarily  from 
irregularities  in  real-world  floor  surfaces.  Even  in  the  relatively  clean  environment  of  our 
laboratory,  the  navigation  issues  arising  in  controlling  motion  over  irregular  floor  surfaces  are 
not  trivial.  We  initially  expected  that  open-loop  control  of  motion  along  a  straight  line  would 
yield  acceptable  performance  for  initial  tests,  but  we  quickly  discovered  that  even  small  drifts  in 
the  motion  would  result  in  unacceptable  distortions  of  the  incoming  texture  data.  We  had  hoped 
to  be  able  to  delay  work  on  the  laser  guidance  part  of  the  system  until  after  some  of  the  more 
basic  issues  were  finished,  but  this  became  impossible.  Therefore,  in  a  parallel  effort,  we  began 
working  out  the  guidance  routines  using  a  separate  small  robot  purchased  for  the  purpose.  The 
result  of  this  effort  was  a  set  of  routines  for  tracking  a  laser  using  either  a  linear  photodiode 
array,  or  a  more  general  routine  for  tracking  the  laser  using  a  reflected  image  of  the  laser  as 
detected  by  the  CCD  camera  acquiring  the  texture  images. 
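
One simple way to track a laser spot on a linear photodiode array is to take the intensity-weighted centroid of the readings and steer against its offset from the array center. The sketch below illustrates that idea only. It is not the guidance routine developed in this work, and the gain value and function names are hypothetical.

```python
def laser_offset(readings):
    """Offset of the intensity-weighted centroid from the array center, in elements.

    Returns None when no light is detected.
    """
    total = sum(readings)
    if total == 0:
        return None
    centroid = sum(i * v for i, v in enumerate(readings)) / total
    return centroid - (len(readings) - 1) / 2.0


def steering_correction(offset, gain=0.5):
    """Proportional correction that steers back toward the laser line."""
    return -gain * offset
```

A purely proportional correction of this kind can itself introduce oscillation in the motion if the gain is poorly chosen.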

Ultimately,  we  would  like  to  utilize  this  work  to  guide  a  room-scanning  robot,  but  in  our 
experiments  we  have  seen  that  the  tracking  routine  introduces  its  own  motion  anomalies,  which, 
when  combined  with  some  of  the  secondary  mechanical  considerations  of  the  mobile  robot 
(maintenance of precise verticality, vibration, etc.), have led us to put the approach on hold in favor of
a  simplified  but  very  practical  form  of  the  room  scanner  (involving  a  scanning  rack). 

As  an  outcome  of  software  development  for  managing  the  texture  acquisition  process,  we 
found it useful to describe the texture records not as arbitrarily-long ribbons but rather as
standard  blocks  of  nominally  5  foot  length.  This  description  arose  from  consideration  of  both 
filing  organization  issues  as  well  as  issues  involving  the  ultimate  application  of  the  textures  to 
polygons  within  a  real-time  VE  application.  In  considering  the  requirements,  namely,  that  of 
very  precise  orthonormal  image  scanning  throughout  the  block  length,  we  decided  to  put  together 
a  scanning  rack  to  provide  a  controlled  path  for  the  camera  to  travel  along.  This  approach, 
although  in  some  ways  superficially  similar  to  taking  photographs  of  each  block  with  a  normal 
camera,  still  preserves  the  "viewpoint-free"  perspective  that  allows  the  edges  of  each  block  to  be 
seamlessly stitched (as described in the Y1 report) without viewpoint irregularities.
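
The division of a wall section into standard blocks can be sketched as follows. The sketch assumes a nominal block length of 5 feet, as stated above, and assumes that a shorter final block is acceptable when the wall length is not a multiple of the block length. The report does not say how remainders are handled, and the function name is illustrative.

```python
import math


def block_spans(wall_length_ft, block_length_ft=5.0):
    """Return (start, end) positions in feet of each texture block along a wall."""
    count = math.ceil(wall_length_ft / block_length_ft)
    spans = []
    for i in range(count):
        start = i * block_length_ft
        end = min(start + block_length_ft, wall_length_ft)
        spans.append((start, end))
    return spans
```

A 12 foot wall, for example, would be covered by two full blocks and one 2 foot remainder.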

The  layout  of  the  rack-based  system  is  fairly  simple.  A  carriage  containing  the  stepper 
motor  and  camera  is  mounted  on  a  toothed  rack  which  is  suspended  between  a  pair  of  supports. 
The  rack  is  positioned  parallel  to  the  surface  (wall)  to  be  scanned  using  one  of  a  number  of 
possible  positioning  aids  (string,  stick,  crossed  laser  beams).  After  the  camera  traverses  the  span 
of  the  rack,  the  assembly  is  slid  over  to  the  next  position,  in  a  motion  that  is  easy  to  repeat 
consistently.  In  this  way  the  blocks  of  texture  are  captured  one  by  one  for  each  room  and  corridor 
of the venue.

2.3.2 Introduction to TOADS (Three-dimensional Open-ended Architectural Database System)

Much  thought  and  effort  in  Y2  has  been  spent  addressing  the  major  bottleneck  in  the 
creation  of  complex  VE  simulations:  the  construction  of  3D  models  and  the  acquisition  of 
photo-realistic  textures  for  those  models.  Moreover,  the  prospect  of  having  to  sort  out  gigabytes 
of  texture  data  and  assign  these  files  to  the  appropriate  polygons  of  a  model  made  us  realize  that 
some  automated  system  for  organizing  and  linking  this  information  would  be  necessary  if  we 
were  to  realize  any  benefit  from  the  large  amount  of  additional  data  collected  with  the  scanner. 

The  automated  VE  Generation  system  simplifies  both  of  these  tasks  by  allowing  users  to 
import  two-dimensional  DXF  floorplan  files  (commonly  available  for  many  installations, 
including all of the buildings at MIT) and then automatically attach textures to walls, doors,
windows, and other objects within the building. The tool can take DXF files, which normally
consist of an uncoordinated collection of lines and polygons, group them logically into rooms,
and classify each object as one of several possible object types (e.g., door or wall). In
conjunction  with  the  scanner,  the  graphical  interface  allows  straightforward  and  unambiguous 
association  of  particular  scans  with  particular  places  in  the  plan  view. 
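
The association of scans with classified floorplan objects can be illustrated with a small sketch that extrudes each 2D segment into a vertical 3D polygon and pairs it with either a specific scanned texture or a default for its object class. This is a sketch under stated assumptions, not the TOADS implementation. It does not parse DXF, and the segment dictionary layout, the texture names, and the 9 foot ceiling height are all hypothetical.

```python
# Hypothetical default textures per object class; the names are placeholders.
DEFAULT_TEXTURES = {"wall": "wall_default", "door": "door_default"}


def extrude_segment(start, end, height):
    """Turn a 2D floorplan segment into a vertical 3D quad of four (x, y, z) vertices."""
    (x0, y0), (x1, y1) = start, end
    return [(x0, y0, 0.0), (x1, y1, 0.0), (x1, y1, height), (x0, y0, height)]


def build_polygons(segments, height=9.0, scanned=None):
    """Pair each extruded segment with its scanned texture, or its class default."""
    scanned = scanned or {}
    polygons = []
    for seg in segments:
        texture = scanned.get(seg["id"], DEFAULT_TEXTURES[seg["kind"]])
        polygons.append({
            "vertices": extrude_segment(seg["start"], seg["end"], height),
            "texture": texture,
        })
    return polygons
```

The output is a list of 3D polygons with associated textures, which is the general form of export described for the scanner post-processing tools.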


3.  OVERVIEW  OF  YEAR  3  WORK  PLAN 

Our  main  effort  during  Y3  will  be  directed  towards  completion  of  the  VE-construction 
system.  Additional  work  will  include  further  analysis  of  the  data  obtained  in  the  experiment 
described  in  Sec.  2.1  above,  preparation  of  this  material  for  publication,  initiation  of  a  “white 
paper”  on  VE-assisted  spatial  training  for  use  by  ONR  in  their  program  planning,  and 
preparation  of  a  renewal  proposal  to  obtain  funds  for  continuing  this  work.  Two  major  foci  of 
our planned future work are (1) the development of a VE training system that includes Virtual
Worlds in Miniature (WIMs) to serve as 3D maps and (2) the development of VE-assisted
methods  for  assessing  basic  spatial  skills  and  abilities  and  for  enhancing  these  skills  and  abilities. 


4.  APPENDICES 


Appendix 1: Glenn Koh: “Training Spatial Knowledge Acquisition Using Virtual Environments.”
Master’s Thesis, MIT Department of Electrical Engineering and Computer Science, 1997.


Appendix 2: Sanford Brian Fitch: “The Finger Walker: A Method to Navigate Virtual
Environments.” Master’s Thesis, MIT Department of Electrical Engineering and Computer
Science, 1998.